我们提出了一种方法,以利用基础马尔可夫决策过程的固有鲁棒性来减少多代理学习系统中所需的通信。我们计算所谓的鲁棒性替代功能(离线),这使代理商保守地表明其状态测量在需要更新系统中的其他代理之前可能会偏离多远。这导致了完全分布的决策功能,使代理可以决定何时需要更新其他人。我们根据获得的奖励总和来得出所得系统的最佳性界限,并显示这些界限是设计参数的函数。此外,我们扩展了从数据中学到鲁棒性替代功能的情况下的结果,并提出了实验结果,证明了代理之间的通信事件显着降低。
translated by 谷歌翻译
我们提出了一种方法来减少由事件触发控制(ETC)技术的分布式Q学习系统所需信息的通信。我们考虑在Markov决策过程(MDP)上的分布式Q学习问题的基线情景。在基于事件的方法之后,N代理商探索MDP并仅在必要时将体验传达给中央学习者,这执行了Actor Q函数的更新。我们设计了一个基于事件的分布式Q学习系统(EBD-Q),并在vanilla Q学习算法方面推出了收敛保证。我们提出了实验结果,示出了基于事件的通信导致这种分布式系统中的数据传输速率大幅度降低。此外,我们讨论基于事件的方法对所研究的学习过程的基于事件的方法以及它们如何应用于更复杂的多代理系统。
translated by 谷歌翻译
我们提出了群生物设计灵感觅食基于蚂蚁信息素的部署,其中假设群有非常有限的能力。机器人不需要全局或相对位置测量和群充分分散,需要在地方没有基础设施。此外,该系统只需要在机器人上的网络单跳通信,我们不做出关于通信图的连通性和信息与计算传输的任何假设是可扩展的与代理的数量。这是通过在群充当觅食让剂或作为导向剂(信标)来完成。我们目前的实验结果计算了ELISA的3个机器人的一个模拟器群,并展示如何在群自行组织了一个未知的环境中解决问题觅食,汇聚成各地的最短路径轨迹。最后,我们讨论这样一个系统的局限性,并提出了觅食的效率如何可以增加。
translated by 谷歌翻译
The performance of inertial navigation systems is largely dependent on the stable flow of external measurements and information to guarantee continuous filter updates and bind the inertial solution drift. Platforms in different operational environments may be prevented at some point from receiving external measurements, thus exposing their navigation solution to drift. Over the years, a wide variety of works have been proposed to overcome this shortcoming, by exploiting knowledge of the system current conditions and turning it into an applicable source of information to update the navigation filter. This paper aims to provide an extensive survey of information aided navigation, broadly classified into direct, indirect, and model aiding. Each approach is described by the notable works that implemented its concept, use cases, relevant state updates, and their corresponding measurement models. By matching the appropriate constraint to a given scenario, one will be able to improve the navigation solution accuracy, compensate for the lost information, and uncover certain internal states, that would otherwise remain unobservable.
translated by 谷歌翻译
We consider infinite horizon Markov decision processes (MDPs) with fast-slow structure, meaning that certain parts of the state space move "fast" (and in a sense, are more influential) while other parts transition more "slowly." Such structure is common in real-world problems where sequential decisions need to be made at high frequencies, yet information that varies at a slower timescale also influences the optimal policy. Examples include: (1) service allocation for a multi-class queue with (slowly varying) stochastic costs, (2) a restless multi-armed bandit with an environmental state, and (3) energy demand response, where both day-ahead and real-time prices play a role in the firm's revenue. Models that fully capture these problems often result in MDPs with large state spaces and large effective time horizons (due to frequent decisions), rendering them computationally intractable. We propose an approximate dynamic programming algorithmic framework based on the idea of "freezing" the slow states, solving a set of simpler finite-horizon MDPs (the lower-level MDPs), and applying value iteration (VI) to an auxiliary MDP that transitions on a slower timescale (the upper-level MDP). We also extend the technique to a function approximation setting, where a feature-based linear architecture is used. On the theoretical side, we analyze the regret incurred by each variant of our frozen-state approach. Finally, we give empirical evidence that the frozen-state approach generates effective policies using just a fraction of the computational cost, while illustrating that simply omitting slow states from the decision modeling is often not a viable heuristic.
translated by 谷歌翻译
In the present work we propose an unsupervised ensemble method consisting of oblique trees that can address the task of auto-encoding, namely Oblique Forest AutoEncoders (briefly OF-AE). Our method is a natural extension of the eForest encoder introduced in [1]. More precisely, by employing oblique splits consisting in multivariate linear combination of features instead of the axis-parallel ones, we will devise an auto-encoder method through the computation of a sparse solution of a set of linear inequalities consisting of feature values constraints. The code for reproducing our results is available at https://github.com/CDAlecsa/Oblique-Forest-AutoEncoders.
translated by 谷歌翻译
When robots learn reward functions using high capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task -- the task ``features" -- as well as how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that contains spurious correlations in the data, which fails to generalize to new settings. Instead, our ultimate goal is to enable robots to identify and isolate the causal features that people actually care about and use when they represent states and behavior. Our idea is that we can tune into this representation by asking users what behaviors they consider similar: behaviors will be similar if the features that matter are similar, even if low-level behavior is different; conversely, behaviors will be different if even one of the features that matter differs. This, in turn, is what enables the robot to disambiguate between what needs to go into the representation versus what is spurious, as well as what aspects of behavior can be compressed together versus not. The notion of learning representations based on similarity has a nice parallel in contrastive learning, a self-supervised representation learning technique that maps visually similar data points to similar embeddings, where similarity is defined by a designer through data augmentation heuristics. By contrast, in order to learn the representations that people use, so we can learn their preferences and objectives, we use their definition of similarity. In simulation as well as in a user study, we show that learning through such similarity queries leads to representations that, while far from perfect, are indeed more generalizable than self-supervised and task-input alternatives.
translated by 谷歌翻译
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually-degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack, with higher level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervision to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
translated by 谷歌翻译
Deep learning models are known to put the privacy of their training data at risk, which poses challenges for their safe and ethical release to the public. Differentially private stochastic gradient descent is the de facto standard for training neural networks without leaking sensitive information about the training data. However, applying it to models for graph-structured data poses a novel challenge: unlike with i.i.d. data, sensitive information about a node in a graph cannot only leak through its gradients, but also through the gradients of all nodes within a larger neighborhood. In practice, this limits privacy-preserving deep learning on graphs to very shallow graph neural networks. We propose to solve this issue by training graph neural networks on disjoint subgraphs of a given training graph. We develop three random-walk-based methods for generating such disjoint subgraphs and perform a careful analysis of the data-generating distributions to provide strong privacy guarantees. Through extensive experiments, we show that our method greatly outperforms the state-of-the-art baseline on three large graphs, and matches or outperforms it on four smaller ones.
translated by 谷歌翻译
Machine learning models are typically evaluated by computing similarity with reference annotations and trained by maximizing similarity with such. Especially in the bio-medical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotation entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
translated by 谷歌翻译